Implementing Performance Libraries on Graphics Hardware
Authors
Abstract
We propose a simple method for implementing floating-point vector math operations and matrix multiplication on graphics hardware. We focus on identifying the software and hardware details that affect performance and ease of use. The graphics processing unit (GPU) cannot be widely adopted as an additional computational processor until application programming interfaces (APIs) exist that abstract away implementation details. We therefore provide a high-level interface to the hardware that hides the specifics of implementing each operation on the GPU while maintaining performance. To demonstrate the strengths of the library on current graphics hardware, we use this interface to implement non-negative matrix factorization (NMF), a technique used for feature extraction.
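The abstract's central idea is that an algorithm such as NMF can be written purely against a small set of vector-math and matrix-multiplication primitives, so the GPU-specific details stay hidden. A short sketch can illustrate this. It is a CPU reference in pure Python, not the authors' library. The function names (`matmul`, `elementwise`, `transpose`, `nmf`) are hypothetical. The use of Lee and Seung's multiplicative update rules for NMF is an assumption, because the abstract does not say which NMF algorithm the library uses. In a GPU-backed version, only the bodies of the primitive functions would change.

```python
import operator
import random
from collections.abc import Callable

# Hypothetical primitive API on row-major lists of lists. A GPU backend could
# reimplement these functions without changing the algorithms built on them.
Matrix = list[list[float]]
EPS = 1e-9  # keeps the multiplicative updates from dividing by zero


def shape(a: Matrix) -> tuple[int, int]:
    return len(a), len(a[0])


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product C = A B, computed as one dot product per output element."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError("inner dimensions do not match")
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def elementwise(op: Callable[[float, float], float], a: Matrix, b: Matrix) -> Matrix:
    """Element-by-element binary operation (the 'vector math' primitive)."""
    if shape(a) != shape(b):
        raise ValueError("shapes do not match")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def safe_div(x: float, y: float) -> float:
    return x / (y + EPS)


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0) -> tuple[Matrix, Matrix]:
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m).

    Assumed algorithm: Lee-Seung multiplicative updates for the Frobenius norm,
    expressed only through the primitives above.
    """
    rng = random.Random(seed)
    n, m = shape(v)
    # Strictly positive start: entries that begin at zero would stay zero
    # under multiplicative updates.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        # H <- H * (W^T V) / (W^T W H)
        h = elementwise(operator.mul, h,
                        elementwise(safe_div, matmul(wt, v), matmul(matmul(wt, w), h)))
        ht = transpose(h)
        # W <- W * (V H^T) / (W H H^T)
        w = elementwise(operator.mul, w,
                        elementwise(safe_div, matmul(v, ht), matmul(w, matmul(h, ht))))
    return w, h


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """||V - W H||_F, used to check the quality of the factorization."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)) ** 0.5


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]  # non-negative, rank 1
    W, H = nmf(V, rank=1)
    print(f"reconstruction error: {frobenius_error(V, W, H):.2e}")
```

`nmf` touches the data only through `matmul`, `transpose`, and `elementwise`. Replacing those three primitives with GPU kernels would therefore leave the factorization code unchanged. The abstract argues for this kind of separation between interface and implementation.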